{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "2eec5cc39a59"
   },
   "outputs": [],
   "source": [
    "# Copyright 2024 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "accbd23b5877"
   },
   "source": [
    "<h1 align=\"center\"> <a href=\"../README.md\">Vertex AI: Gemini Evaluations Playbook </a><br>\n",
    "Optimize with grid search of experiments</h1>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ec924642b9dd"
   },
   "source": [
    "<table align=\"left\">\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_gridsearch-from_notebook-colab&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_gridsearch-from_notebook-colab&destination=gemini_evals_playbook_gridsearch-from_notebook-colab&url=https%3A%2F%2Fcolab.sandbox.google.com%2Fgithub%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F2_gemini_evals_playbook_gridsearch.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_gridsearch-from_notebook-colab_ent&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_gridsearch-from_notebook-colab_ent&destination=gemini_evals_playbook_gridsearch-from_notebook-colab_ent&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fcolab%2Fimport%2Fhttps%3A%252F%252Fraw.githubusercontent.com%252FGoogleCloudPlatform%252Fapplied-ai-engineering-samples%252Fmain%252Fgenai-on-vertex-ai%252Fgemini%252Fevals_playbook%252Fnotebooks%252F2_gemini_evals_playbook_gridsearch.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
    "    </a>\n",
    "  </td>    \n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_gridsearch-from_notebook-github&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_gridsearch-from_notebook-github&destination=gemini_evals_playbook_gridsearch-from_notebook-github&url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F2_gemini_evals_playbook_gridsearch.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_gridsearch-from_notebook-vai_workbench&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_gridsearch-from_notebook-vai_workbench&destination=gemini_evals_playbook_gridsearch-from_notebook-vai_workbench&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fworkbench%2Fdeploy-notebook%3Fdownload_url%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F2_gemini_evals_playbook_gridsearch.ipynb\">\n",
    "      <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "73deb521dfd3"
   },
   "source": [
    "# Evals Playbook: Optimize with grid search of experiments\n",
    "\n",
    "This notebook shows you systematically exploring different experiment configurations  by testing various prompt templates or model settings (like temperature), or combinations of these using a grid-search style approach. The notebook performs following steps:\n",
    "\n",
    "- Define the evaluation task\n",
    "- Prepare evaluation dataset\n",
    "- Define an experiment by:\n",
    "    - Configuring the model\n",
    "    - Setting prompt and system instruction\n",
    "    - Establishing evaluation criteria (metrics)\n",
    "- Run evaluations using [Vertex AI Rapid Eval SDK](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation)\n",
    "- Log detailed results and summarizing through aggregated metrics.\n",
    "- Side-by-side comparison of evaluation runs for a comprehensive analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a1f5b2e429ad"
   },
   "source": [
    "## 🚧 0. Pre-requisites\n",
    "\n",
    "Make sure that you have completed the initial setup process using [0_gemini_evals_playbook_setup.ipynb](0_gemini_evals_playbook_setup.ipynb). If the 0_gemini_evals_playbook_setup notebook has been run successfully, the following are set up:\n",
    "\n",
    "* GCP project and APIs to run the eval pipeline\n",
    "\n",
    "* All the required IAM permissions\n",
    "\n",
    "* Environment to run the notebooks\n",
    "\n",
    "* Bigquery datasets and tables to track evaluation results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "6bc4f990ca5b"
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "edb768ddbc75"
   },
   "source": [
    "### Read configurations\n",
    "\n",
    "The configuration saved previously in [0_gemini_evals_playbook_setup.ipynb](0_gemini_evals_playbook_setup.ipynb) will be used for initializing variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "465feb3e9138"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "module_path = os.path.abspath(os.path.join(\"..\"))\n",
    "sys.path.append(module_path)\n",
    "print(f\"module_path: {module_path}\")\n",
    "\n",
    "# Import all the parameters\n",
    "from utils.config import (LOCATION, PROJECT_ID, STAGING_BUCKET,\n",
    "                          STAGING_BUCKET_URI)\n",
    "from utils.evals_playbook import Evals, generate_uuid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e236046a4da7"
   },
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "9aad01600617"
   },
   "outputs": [],
   "source": [
    "import datetime\n",
    "import itertools\n",
    "import re\n",
    "\n",
    "import pandas as pd\n",
    "import vertexai\n",
    "from datasets import Dataset, load_dataset\n",
    "from vertexai.evaluation import EvalTask, constants\n",
    "from vertexai.generative_models import (GenerativeModel, HarmBlockThreshold,\n",
    "                                        HarmCategory, SafetySetting)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d02af5cfe722"
   },
   "source": [
    "### Initialize Vertex AI SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2c5cb3f27bd2"
   },
   "outputs": [],
   "source": [
    "vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET_URI)\n",
    "\n",
    "print(\"Vertex AI SDK initialized.\")\n",
    "print(f\"Vertex AI SDK version = {vertexai.__version__}\")\n",
    "\n",
    "# pandas display full column values\n",
    "pd.set_option(\"display.max_colwidth\", None)\n",
    "pd.set_option(\"display.max_rows\", None)\n",
    "pd.set_option(\"display.max_columns\", None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "39ddfb1c94b2"
   },
   "source": [
    "### Define `Evals` object\n",
    "\n",
    "[`Evals`](../utils/evals_playbook.py) is a helper class helps to define tasks, experiments and log evaluation results. Define an instance of `Evals` class to use in the rest of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "148797e76e59"
   },
   "outputs": [],
   "source": [
    "# Define eval object\n",
    "evals = Evals()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c2b2ed5dca34"
   },
   "source": [
    "## 🛠️ 1. Configure parameter grid to run experiments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aa452e253930"
   },
   "source": [
    "### Define exploration space as grid\n",
    "\n",
    "Define a dictionary with parameters names (str) as keys such as prompt template or temperature. For each key, specify a list of settings to try as values, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings. This is similar to defining [grid search](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) in ML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "b0608100c858"
   },
   "outputs": [],
   "source": [
    "param_grid = {\n",
    "    \"prompt\": [  # Format: (prompt_id, prompt_description, prompt_template)\n",
    "        (\n",
    "            \"prompt_template_1\",\n",
    "            \"Single Sentence\",\n",
    "            \"Summarize this PubMed article: {context}\",\n",
    "        ),\n",
    "        (\"prompt_template_2\", \"Structured\", \"Article: {context}. Summary:\"),\n",
    "    ],\n",
    "    \"temperature\": [0.0, 0.1, 0.2],\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e89f57d7797c"
   },
   "source": [
    "### Configure Model\n",
    "\n",
    "Define the Gemini model you want to evaluate your task on including name, configuration settings such as temperature and safety settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "74bacad514ce"
   },
   "outputs": [],
   "source": [
    "system_instruction = \"\"\"Instruction: You are a medical researcher writing a plain language Summary of your Article for a layperson.\n",
    "\n",
    "Translate any medical terms to simple english explanations.\n",
    "Use first-person 'We'.  Use short bullet points addressing following\n",
    "- Purpose: What was the purpose of the study?\n",
    "- Research: What did the researchers do?\n",
    "- Findings: What did they find?\n",
    "- Implications: What does this mean for me?\"\n",
    "\"\"\"\n",
    "\n",
    "#\n",
    "safety_settings = [\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_HARASSMENT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "]\n",
    "\n",
    "#\n",
    "model_name = \"gemini-2.0-flash-001\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4c6793294550"
   },
   "source": [
    "### Configure Metrics\n",
    "\n",
    "In this section, you configure the evaluation criteria for your task. You can choose from the [built-in metrics (or metric bundles)](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation#metric-bundles) from Vertex AI Rapid Eval SDK or define a custom metric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "5d277b5b33fd"
   },
   "outputs": [],
   "source": [
    "#\n",
    "metrics = [\n",
    "    constants.Metric.ROUGE_1,\n",
    "    constants.Metric.ROUGE_L_SUM,\n",
    "    constants.Metric.BLEU,\n",
    "    constants.Metric.FLUENCY,\n",
    "    constants.Metric.COHERENCE,\n",
    "    constants.Metric.SAFETY,\n",
    "    constants.Metric.GROUNDEDNESS,\n",
    "    constants.Metric.SUMMARIZATION_QUALITY,\n",
    "]\n",
    "\n",
    "# build a metric config object for tracking\n",
    "metric_config = [\n",
    "    {\"metric_name\": metric, \"type\": \"prebuilt\", \"metric_scorer\": \"Vertex AI\"}\n",
    "    for metric in metrics\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9efb424fe602"
   },
   "source": [
    "### Prepare evaluation dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "b26615ca9dfc"
   },
   "outputs": [],
   "source": [
    "# Prompt Template\n",
    "prompt_template = \"Article: {context} \\nSummary:\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "f9aed69678c6"
   },
   "outputs": [],
   "source": [
    "from google.cloud import storage\n",
    "\n",
    "# # OPTION 1:\n",
    "# # Load prepared dataset from GCS\n",
    "# # Path to your CSV file in GCS\n",
    "# file_name = \"pubmed_summary.csv\"\n",
    "# file_path = f\"gs://{STAGING_BUCKET}/{file_name}\"\n",
    "\n",
    "# # Read the CSV file into pandas DataFrame\n",
    "# eval_dataset = pd.read_csv(file_path)\n",
    "\n",
    "\n",
    "# OPTION 2:\n",
    "# Load and prepare public dataset from HuggingFace\n",
    "ds_stream = load_dataset(\n",
    "    \"ccdv/pubmed-summarization\", \"document\", split=\"test\", streaming=True\n",
    ")\n",
    "num_rows = 10\n",
    "dataset = Dataset.from_list(list(itertools.islice(ds_stream, num_rows)))\n",
    "\n",
    "# convert HuggingFace dataset to Pandas dataframe\n",
    "eval_dataset = dataset.to_pandas()\n",
    "# rename columns as per Vertex AI Rapid Eval SDK defaults\n",
    "eval_dataset.columns = [\"context\", \"reference\"]\n",
    "# add instruction for calculating metrics (not all metrics need instruction)\n",
    "eval_dataset[\"instruction\"] = system_instruction\n",
    "# add prompt column\n",
    "eval_dataset[\"prompt\"] = eval_dataset[\"context\"].apply(\n",
    "    lambda x: prompt_template.format(context=x)\n",
    ")\n",
    "# add prompt id for tracking\n",
    "eval_dataset[\"dataset_row_id\"] = [f\"dataset_row_{i}\" for i in eval_dataset.index]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bcd5bee8c3a4"
   },
   "source": [
    "### Define Evaluation task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "16a3e25561df"
   },
   "outputs": [],
   "source": [
    "# create and log task\n",
    "task_id = \"task_summarization\"\n",
    "task = evals.Task(\n",
    "    task_id=task_id,\n",
    "    task_desc=\"summarize pubmed articles\",\n",
    "    create_datetime=datetime.datetime.now(),\n",
    "    update_datetime=datetime.datetime.now(),\n",
    "    tags=[\"pubmed\"],\n",
    ")\n",
    "evals.log_task(task)\n",
    "\n",
    "#\n",
    "evals.get_all_tasks()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a9c05eba57f0"
   },
   "source": [
    "## ⏳ 2. Run experiments on the grid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "48cb56edc7b0"
   },
   "outputs": [],
   "source": [
    "# Note thar this cell can take time to finish!\n",
    "from sklearn.model_selection import ParameterGrid\n",
    "\n",
    "grid = ParameterGrid(param_grid)\n",
    "experiment_run_ids = []\n",
    "\n",
    "# print(list(grid))\n",
    "\n",
    "for indx, params in enumerate(grid):\n",
    "\n",
    "    prompt_id, prompt_description, prompt_template = params[\"prompt\"]\n",
    "    temperature = params[\"temperature\"]\n",
    "\n",
    "    # Print above parameters, one in each line\n",
    "    # print(f'prompt_id: {prompt_id}\\nprompt_description: {prompt_description}\\nprompt_template: {prompt_template}\\ntemperature: {temperature}\\n')\n",
    "\n",
    "    # Track status\n",
    "    print(\"Running ........\")\n",
    "    print(f\"{indx+1}. {params}\")\n",
    "\n",
    "    # Set up the experiment\n",
    "    experiment_id = f\"prompt-{prompt_id}-{temperature}\"\n",
    "    experiment_desc = f\"Simple language summary with prompt {prompt_id} and temperature {temperature} \"\n",
    "    tags = [\"pubmed\"]\n",
    "    metadata = {}\n",
    "\n",
    "    # print(experiment_id, experiment_desc)\n",
    "\n",
    "    generation_config = {\"temperature\": temperature}\n",
    "\n",
    "    model = GenerativeModel(\n",
    "        model_name=model_name,\n",
    "        generation_config=generation_config,\n",
    "        safety_settings=safety_settings,\n",
    "        system_instruction=system_instruction,\n",
    "        # TODO: Add tools and tool_config\n",
    "    )\n",
    "\n",
    "    # Configure and log prompt\n",
    "    prompt = evals.Prompt(\n",
    "        prompt_id=prompt_id,\n",
    "        prompt_description=prompt_description,\n",
    "        prompt_type=\"single-turn\",  # single-turn, chat,\n",
    "        is_multimodal=False,\n",
    "        system_instruction=system_instruction,\n",
    "        prompt_template=prompt_template,\n",
    "        create_datetime=datetime.datetime.now(),\n",
    "        update_datetime=datetime.datetime.now(),\n",
    "        tags=tags,\n",
    "    )\n",
    "    evals.log_prompt(prompt)\n",
    "\n",
    "    # Configure and log experiment\n",
    "    experiment = evals.log_experiment(\n",
    "        task_id=task_id,\n",
    "        experiment_id=experiment_id,\n",
    "        experiment_desc=experiment_desc,\n",
    "        prompt=prompt,\n",
    "        model=model,\n",
    "        metric_config=metric_config,\n",
    "        tags=tags,\n",
    "    )\n",
    "\n",
    "    # Run Experiment\n",
    "    _experiment_id = re.sub(\"[^0-9a-zA-Z]\", \"-\", experiment_id.lower())\n",
    "    eval_task = EvalTask(\n",
    "        dataset=eval_dataset, metrics=metrics, experiment=_experiment_id\n",
    "    )\n",
    "\n",
    "    experiment_run_name = generate_uuid(_experiment_id)\n",
    "    experiment_run_ids.append(experiment_run_name)\n",
    "    eval_result = eval_task.evaluate(\n",
    "        model=model,\n",
    "        prompt_template=prompt_template,\n",
    "        experiment_run_name=experiment_run_name,\n",
    "    )\n",
    "\n",
    "    run_path = f\"{task_id}/prompts/{_experiment_id}/{experiment_run_name}\"\n",
    "    evals.log_eval_run(\n",
    "        experiment_run_id=experiment_run_name,\n",
    "        experiment=experiment,\n",
    "        eval_result=eval_result,\n",
    "        run_path=run_path,\n",
    "        tags=tags,\n",
    "        metadata=metadata,\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f1000dbf40f5"
   },
   "source": [
    "- Fetch run details\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<b> Use <code>evals.get_all_eval_runs()</code> or <code>evals.get_eval_runs(experiment_id=experiment_id)</code> to get run ids.</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f00198617dca"
   },
   "outputs": [],
   "source": [
    "evals.get_all_eval_runs()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "780d6759a46b"
   },
   "outputs": [],
   "source": [
    "# To find run_id for previous runs, see \n",
    "# gemini_evals_plapbook(schema) >> eval_runs(table) >> run_id (column) on bigquery\n",
    "evals.get_eval_run_detail(\n",
    "    experiment_run_id=\"[your_run_id]\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "590c59f4df91"
   },
   "source": [
    "## 🔍 3. Grid search\n",
    "\n",
    "Search the grid for optimal configuration with respect to metrics of choice\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "0db76b6304ac"
   },
   "outputs": [],
   "source": [
    "# Set the task_id to perform the search\n",
    "task_id = \"task_summarization\"\n",
    "\n",
    "# Metrics to be used for grid search\n",
    "opt_metrics = [\n",
    "    \"ROUGE_1\",\n",
    "    \"BLEU\",\n",
    "]  # Options: \"ROUGE_1\", \"ROUGE_L_SUM\", \"BLEU\", \"FLUENCY\", \"COHERENCE\", \"SAFETY\", \"GROUNDEDNESS\", \"SUMMARIZATION_QUALITY\", \"SUMMARIZATION_VERBOSITY\", \"SUMMARIZATION_HELPFULNESS\"\n",
    "\n",
    "# Paramaters to be retrieved from grid search\n",
    "opt_params = [\n",
    "    \"prompt_template\",\n",
    "    \"temperature\",\n",
    "]  # Options: \"experiment_desc\", \"prompt_template\", \"temperature\", \"system_instruction\", \"model_name\"\n",
    "\n",
    "# Use run_ids collected during grid search: experiment_run_ids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f4cda73f66aa"
   },
   "outputs": [],
   "source": [
    "# Comparision of runs in experiment grid\n",
    "evals.compare_eval_runs(experiment_run_ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "e80f91f765ed"
   },
   "outputs": [],
   "source": [
    "# Outcome of gridsearch\n",
    "evals.grid_search(\n",
    "    task_id=task_id,\n",
    "    experiment_run_ids=experiment_run_ids,\n",
    "    opt_metrics=opt_metrics,\n",
    "    opt_params=opt_params,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "62ea2755c40e"
   },
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a6966a4d487f"
   },
   "source": [
    "## 🧹 Cleaning up\n",
    "\n",
    "Uncomment the following cells to clean up resources created as part of the Evals Playbook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "bc565a4fd07d"
   },
   "outputs": [],
   "source": [
    "# # Delete BigQuery Dataset using bq utility\n",
    "# ! bq rm -r -f -d {BQ_DATASET_ID}\n",
    "\n",
    "# # Delete GCS bucket\n",
    "# ! gcloud storage rm --recursive {STAGING_BUCKET_URI}"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "2_gemini_evals_playbook_gridsearch.ipynb",
   "toc_visible": true
  },
  "environment": {
   "kernel": "python3",
   "name": "workbench-notebooks.m128",
   "type": "gcloud",
   "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m128"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
